ABSTRACT

To reduce the cost of item writing and to enhance the 
flexibility of item presentation, items can be generated by item-cloning 
techniques. An important consequence of cloning is that it may cause 
variability in the item parameters. Therefore, a multilevel item response
model is presented in which it is assumed that the item parameters of a three- 
parameter logistic model describing response behavior are sampled from a 
multivariate normal distribution associated with a parent item. In this 
approach to item calibration, only distributions of item parameters are 
estimated. Therefore, the savings in item calibration costs for the item 
cloning model are potentially enormous. A marginal maximum likelihood and a 
Bayesian item-calibration procedure are formulated. Further, a two-stage item
selection procedure for computerized adaptive testing is presented. First, a
set of items cloned from the same parent item is selected to be optimal at the 
ability estimate. Second, a random item from this set is administered. 
Simulation studies illustrate the accuracy of the item pool calibration and 
ability estimation procedures. An appendix describes Bayes model estimates for 
the item cloning model. (Contains 21 references.) (Author/SLD) 
Abstract

To reduce the cost of item writing and to enhance the flexibility of item presentation, 
items can be generated by item-cloning techniques. An important consequence of cloning 
is that it may cause variability in the item parameters. Therefore, a multilevel item
response model is presented in which it is assumed that the item parameters of a 3-parameter
logistic model describing response behavior are sampled from a multivariate normal 
distribution associated with a parent item. In the present approach to item calibration, only 
distributions of item parameters are estimated. Therefore, the savings in item calibration 
costs for the item cloning model are potentially enormous. A marginal maximum 
likelihood and a Bayesian item calibration procedure are formulated. Further, a two- 
stage item selection procedure for computerized adaptive testing is presented: First, a set 
of items cloned from the same parent item is selected to be optimal at the ability estimate. 
Second, a random item from this set is administered. Simulation studies illustrate the 
accuracy of the item pool calibration and ability estimation procedures. 

Keywords: computerized adaptive testing, item clones, item shells, multilevel item 
response theory, marginal maximum likelihood, Bayesian item selection. 
Introduction

A major impediment to cost-effective implementation of computerized adaptive 
testing (CAT) is the amount of resources needed for item pool development. One of the 
solutions to the problem currently pursued is generating pools of items by using item-cloning
techniques. Early pioneers of this idea were Bormuth (1970), Hively, Patterson
and Page (1968) and Osburn (1968). Common to their approaches is a formal description
of a set of “parent items” along with algorithms to derive a larger set of operational items
from them. These parent items are known as “item forms”, “item templates”, or “item
shells”, whereas the items generated from them are now widely known as “item clones”.
We will use the term “parent item” to denote both the initial item and the set of clones
generated from it.

Parent items may take the form of a syntactic description of a test item with one or 
more variable places for which substitution sets are specified. Clones are then generated 
by random draws from the substitution sets. In these “replacement set procedures” 
(Millman & Westman, 1989) the computer puts the answers to multiple-choice items in 
random order, picks distractors from a list of possible wrong answers, and, in numerical 
problems, substitutes random numbers in a specific spot in the item stem and adjusts 
the alternatives accordingly. Parent items may also consist of intact items from which
clones are generated using transformation rules. Examples of such rules are linguistic
rules that transform one verbal item into others, geometric rules that present objects
from a different angle for spatial ability testing, transformations that allow one molecular
structure to be derived from another in testing knowledge of organic chemistry, or rules
from propositional logic that generate items for testing analytic reasoning ability.
Comprehensive reviews of the technology of item cloning are given in Bejar (1993) and 
Roid and Haladyna (1982). 
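
The replacement set procedure described above can be illustrated with a minimal sketch. The parent item, its substitution sets, and the distractor offsets below are hypothetical and serve only to show the mechanics (random slot substitution, distractors drawn from a list of wrong answers, options in random order); they are not taken from any of the cited systems.

```python
import random

# Hypothetical parent item: a stem with two variable slots (substitution sets).
PARENT = {
    "stem": "What is {x} + {y}?",
    "slots": {"x": range(10, 50), "y": range(10, 50)},
}

def generate_clone(parent, rng, n_options=4):
    """Draw one clone: fill the slots, pick distractors, shuffle the options."""
    values = {name: rng.choice(list(pool)) for name, pool in parent["slots"].items()}
    key = values["x"] + values["y"]
    # Hypothetical list of possible wrong answers, adjusted to the drawn numbers.
    wrong = [key + d for d in (-10, -2, -1, 1, 2, 10)]
    options = rng.sample(wrong, n_options - 1) + [key]
    rng.shuffle(options)  # answers are presented in random order
    return {"stem": parent["stem"].format(**values), "options": options, "key": key}

rng = random.Random(1)
clones = [generate_clone(PARENT, rng) for _ in range(3)]
for clone in clones:
    print(clone["stem"], clone["options"])
```

Each call yields a different clone of the same parent, which is the source of the within-parent variability in item parameters discussed in the remainder of the paper.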

An important question is whether clones from the same parent item have comparable 
statistical characteristics. If they do, important savings in the costs of item pool calibration 
are possible, because it would then suffice to calibrate the characteristics of the parent 
only. In an extreme case, one might assume that the item parameters are constant over 
the clones derived from the same parent. Empirical studies addressing this question are
reported in, for example, Hively, Patterson and Page (1968), Macready (1983), Macready 
and Merwin (1973) and Meisner, Luecht and Reckase (1993). The general impression 
from these studies is that the variability between clones from the same parent is much 
smaller than between parents, but not small enough to justify the assumption of identical 
values. Of course, the size of the remaining variability depends on various factors, such as 
the type of knowledge or skill tested and the implementation of the item cloning technique. 

The current paper is based on the expectation that attempts to improve item cloning 
techniques are desirable but that some degree of within-parent variability will always 
remain. The best way to deal with this variability is not to ignore it, but to model the 
distribution of the item parameters and allow for the uncertainty about their individual 
values when selecting the adaptive test. 

A design for adaptive testing that fits in naturally with this approach is one with 
item selection based on stratified or two-staged sampling of items from the pool. In 
this sampling design, each item is selected in the following two steps: (1) A parent is 
selected from the pool with a set of clones that is optimal at the current ability estimate 
of the examinee; (2) A clone is randomly sampled from the set and administered to the 
examinee. This design capitalizes on the statistical advantage of administering tests with 
items adapted to the examinee’s ability but, as will be discussed below, due to the random
sampling in the second step, also saves an important part of the resources needed for item 
calibration in regular CAT. 

The proposed sampling design leads to a two-level item response theory (IRT)
approach, with a lower level at which item clones are represented by a three-parameter
logistic (3PL) model and a higher level at which the item parameters in this model are 
random with a (joint) distribution that represents within-parent variability. To capture 
between-parent variability in item parameter values, these distributions are allowed to 
vary in location and variance. 

In the model below, the distributions of the item parameters for the parents are 
characterized by nine hyperparameters each. The values for these hyperparameters are 
estimated from a data set in which, for each examinee in the sample, one clone is sampled
from each parent. Because sampling is at random, the fact that the responses to the other
clones from the same parent are missing can be ignored. Estimating nine hyperparameters
per parent (three means, three variances, and three covariances) is the equivalent of
calibrating three items under the 3PL model. Since item-cloning
techniques easily lead to much larger numbers of clones per parent, the savings in
the resources needed when collecting calibration data are potentially enormous.

When selecting parent items in the first stage of the item-selection procedure, we 
have to cope with distributions rather than individual values for the item parameters. 
An obvious solution is to base selection of the parents on a Bayesian criterion with 
the distribution of the item parameters averaged out. The result is a reduction in the 
accuracy of ability estimation. Numerical examples of this reduction are shown in the 
empirical examples presented below, both relative to the case of regular CAT from a pool 
of individual items and a pool of cloned items calibrated under the regular 3PL model. 

It is instructive to observe how the proposed type of adaptive testing can be viewed 
as an intermediate case of (1) classical domain-referenced testing under a binomial model 
(e.g., Lord & Novick, 1968, chap. 23) and (2) regular CAT from a pool of individual items. This
type of CAT shares the idea of random item selection with the former and optimal selection 
at ability estimates with the latter. If all variability between the item-parameter values is 
within the parents, it is identical to domain-referenced testing. If all variability is between 
the parents, it is identical to CAT from a pool of individually written and calibrated items. 
However, if item cloning is effective, much smaller within-parent than between-parent 
variability is expected, and the proposed type of adaptive testing has efficiency close to 
regular CAT. 

From a practical point of view, it often is necessary to have test specialists review items
generated by cloning algorithms before they are administered. The necessity of review
becomes more crucial if (1) the domain of knowledge or skills contains socially sensitive
material and (2) the algorithms cannot be fully trusted. However, from a statistical point
of view, it does not make much difference if in the second stage of item selection clones
are drawn randomly from large sets of items generated and reviewed earlier that are stored
physically in computer memory or if they are generated on the fly by computer algorithms
with a random seed. In either case the critical assumption of random sampling is met, and 
sampling is from approximately the same parameter distributions. 



Model 



Consider an item pool generated from parents $p = 1, \ldots, P$. The clones from parent
$p$ will be labeled $i_p = 1, \ldots, I_p$. The first-level model is the 3PL model, which describes
the probability of success on item $i_p$ as

$$p_{i_p}(\theta) = \Pr\{X_{i_p} = 1\} = c_{i_p} + (1 - c_{i_p}) \frac{\exp[a_{i_p}(\theta - b_{i_p})]}{1 + \exp[a_{i_p}(\theta - b_{i_p})]}, \tag{1}$$



where $X_{i_p}$ is the response variable for item $i_p$, with $X_{i_p} = 1$ for a correct and $X_{i_p} = 0$ for
an incorrect response. The values of the item parameters $(a_{i_p}, b_{i_p}, c_{i_p})$ are realizations of
a random vector. The second-level model describes the distribution of this vector through
the transformation

$$\xi_{i_p} = (\log a_{i_p}, b_{i_p}, \operatorname{logit} c_{i_p}) \tag{2}$$

with a multivariate normal distribution

$$\xi_{i_p} \sim N(\mu_p, \Sigma_p), \tag{3}$$

where $\mu_p$ is the vector with the mean values of the (transformed) item parameters for parent $p$ and
$\Sigma_p$ their covariance matrix. The transformation in (2) is introduced to give the item
parameters scales for which the assumption of multivariate normality in (3) is reasonable.
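
As an illustration of the two levels of the model, the following sketch draws clone parameters from a multivariate normal distribution on the transformed scale and evaluates the 3PL response probability. The hyperparameter values `mu_p` and `sigma_p` are hypothetical, and the helper names are introduced here for illustration only.

```python
import math
import random

def p_3pl(theta, a, b, c):
    """Success probability under the 3PL model in (1)."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def cholesky(m):
    """Lower-triangular Cholesky factor of a small positive definite matrix."""
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(m[i][i] - s)
            else:
                low[i][j] = (m[i][j] - s) / low[j][j]
    return low

def sample_clone(mu, sigma, rng):
    """Draw xi = (log a, b, logit c) ~ N(mu, sigma) and map back to (a, b, c)."""
    low = cholesky(sigma)
    z = [rng.gauss(0.0, 1.0) for _ in mu]
    xi = [mu[i] + sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(len(mu))]
    a = math.exp(xi[0])                    # inverse of the log transformation
    c = 1.0 / (1.0 + math.exp(-xi[2]))     # inverse of the logit transformation
    return a, xi[1], c

# Hypothetical hyperparameters for one parent (illustrative values only).
mu_p = [0.0, 0.5, math.log(0.2 / 0.8)]
sigma_p = [[0.02, 0.00, 0.00],
           [0.00, 0.05, 0.01],
           [0.00, 0.01, 0.04]]

rng = random.Random(7)
clone_params = [sample_clone(mu_p, sigma_p, rng) for _ in range(5)]
for a, b, c in clone_params:
    print(round(a, 3), round(b, 3), round(c, 3), round(p_3pl(0.0, a, b, c), 3))
```

The transformation guarantees that every sampled clone has a positive discrimination parameter and a guessing parameter strictly between 0 and 1.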

In the calibration and item selection procedures below, we will assume that $\theta$ has a
standard normal prior distribution, that is,

$$\theta \sim N(0, 1). \tag{4}$$

This assumption holds if examinee $j$ is from a population of exchangeable examinees with a normal
distribution of abilities.

The model presented in (1)-(4) has several relatives. The multilevel IRT models for
testlets in Bradlow, Wainer & Wang (1999) and Wainer, Bradlow, and Du (2000) differ
from the present model in having a random component for difficulty parameter $b_i$ but fixed
parameters $a_i$ and $c_i$. The random component is used to allow for dependence between
responses to fixed items in the same testlet. Because our items are randomly sampled
from parents, all item parameters need to be random and dependence between responses
to items from the same parent is captured by the covariance matrix in (3). The present
model also differs from the ones in Albers, Does, Imbos and Janssen (1989) and Janssen,
Tuerlinckx, Meulders and De Boeck (2000), who also assume item sampling but model the
process by a version of the 1PL model with a random difficulty parameter.

Item Pool Calibration 

In the present approach, item pool calibration amounts to estimation of the values
for each parent of the hyperparameters in the distribution in (3). It is assumed that
these parameters are stacked in a vector $\eta = (\mu_1, \Sigma_1, \ldots, \mu_P, \Sigma_P)$. The values of these
parameters can be estimated by the methods of marginal maximum likelihood (MML) or
Bayes modal estimation (MAP).

The response vector of examinee $j$ is denoted as $x_j = (x_{i_p j}) = (x_{i_1 j}, \ldots, x_{i_P j})$, where
$i_p$ is item clone $i$ randomly drawn from parent $p$. As already noted, vector $\eta$ is estimated
from a data set that contains, for each examinee $j$, the responses to one item clone sampled from
each parent. Because the responses to the other item clones are missing at random, they can
be ignored. In practice, the adaptive nature of the test will also involve sets of calibration
data with examinees missing parents. These data are missing at random too. However, to
avoid unnecessary complexity, our notation will not make this incompleteness explicit.
MML Calibration

In MML estimation, a distinction is made between structural and nuisance
parameters. The structural parameters are estimated from a log-likelihood marginalized
with respect to the nuisance parameters. In the present case, the structural parameters
are in the vector $\eta$, whereas the nuisance parameters are the ability parameters $\theta$ and the
random item parameters $\xi_{i_p}$. These nuisance parameters are supposed to be stacked in
vectors $\theta$ and $\xi$, respectively.

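The marginal probability of observing response pattern $x_j$ is given by

$$
\begin{aligned}
p(x_j \mid \eta) &= \int \cdots \int p(x_j \mid \theta, \xi)\, f(\xi, \theta \mid \eta)\, d\xi\, d\theta && (5) \\
&= \int \cdots \int \prod_p p(x_{i_p j} \mid \theta, \xi_{i_p})\, h(\xi_{i_p} \mid \mu_p, \Sigma_p)\, \phi(\theta)\, d\xi_{i_p}\, d\theta && (6) \\
&= \int \prod_p \left[ \int \cdots \int p(x_{i_p j} \mid \theta, \xi_{i_p})\, h(\xi_{i_p} \mid \mu_p, \Sigma_p)\, d\xi_{i_p} \right] \phi(\theta)\, d\theta, && (7)
\end{aligned}
$$

where $h$ denotes the multivariate normal density in (3) and $\phi$ the standard normal density in (4).
The marginal log-likelihood of $\eta$ is given by

$$\log L(\eta; x) = \sum_j \log p(x_j \mid \eta). \tag{8}$$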
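
To make the structure of the marginal probability concrete, the sketch below approximates it numerically: the inner integrals over the item parameters are replaced by Monte Carlo averages and the outer integral over ability by a sum over an equally spaced grid. These numerical choices, the parametrization of each parent by a mean vector and a lower-triangular factor of its covariance matrix, and all numerical values are assumptions made for illustration; they are not the computational method of the paper.

```python
import math
import random

def p_3pl(theta, xi):
    """3PL success probability for transformed parameters xi = (log a, b, logit c)."""
    a = math.exp(xi[0])
    c = 1.0 / (1.0 + math.exp(-xi[2]))
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - xi[1])))

def draw_xi(mu, chol, rng):
    """One draw from N(mu, chol chol^T); chol is a lower-triangular factor."""
    z = [rng.gauss(0.0, 1.0) for _ in mu]
    return [mu[i] + sum(chol[i][k] * z[k] for k in range(i + 1)) for i in range(len(mu))]

def expected_response_prob(x, theta, draws):
    """Inner integral: average p(x | theta, xi) over draws of xi."""
    total = 0.0
    for xi in draws:
        p = p_3pl(theta, xi)
        total += p if x == 1 else 1.0 - p
    return total / len(draws)

def marginal_prob(responses, parents, n_draws=500, n_grid=81, seed=0):
    """Approximate p(x_j | eta): Monte Carlo over xi, grid sum over theta."""
    rng = random.Random(seed)
    draws = [[draw_xi(mu, chol, rng) for _ in range(n_draws)] for mu, chol in parents]
    step = 10.0 / (n_grid - 1)
    total = 0.0
    for g in range(n_grid):
        theta = -5.0 + step * g
        phi = math.exp(-0.5 * theta * theta) / math.sqrt(2.0 * math.pi)
        prod = 1.0
        for x, d in zip(responses, draws):
            prod *= expected_response_prob(x, theta, d)
        total += prod * phi * step
    return total

# Hypothetical two-parent pool: (mean of xi, lower-triangular factor of Sigma_p).
chol = [[0.2, 0.0, 0.0], [0.0, 0.3, 0.0], [0.0, 0.0, 0.2]]
parents = [([0.0, -0.5, -1.386], chol), ([0.2, 0.8, -1.386], chol)]
patterns = [(0, 0), (0, 1), (1, 0), (1, 1)]
probs = {x: marginal_prob(x, parents) for x in patterns}
# Marginal log-likelihood for three hypothetical examinees.
log_lik = sum(math.log(probs[x]) for x in [(1, 0), (1, 1), (0, 0)])
print(probs, log_lik)
```

Because the same seed is used for every pattern, the approximated probabilities of the four possible response patterns add up to one apart from a small quadrature error, which is a convenient check on the implementation.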

The marginal likelihood equations for $\eta$ can be easily derived using Fisher’s identity
(Efron, 1977; Louis, 1982). The first-order derivatives with respect to $\eta$ can be written as

$$\frac{\partial}{\partial \eta} \log L(\eta; x) = \sum_j E\left( \frac{\partial}{\partial \eta} \log f_j(\xi, \theta_j \mid \eta) \,\Big|\, x_j, \eta \right) = 0, \tag{9}$$

where $\log f_j(\xi, \theta_j \mid \eta)$ is the so-called “complete data” log-likelihood

$$\log f_j(\xi, \theta_j \mid \eta) = \sum_p \log p(x_{i_p j} \mid \theta_j, \xi_{i_p}) + \sum_p \log p(\xi_{i_p} \mid \mu_p, \Sigma_p) + \log \phi(\theta_j),$$

and the expectation is with respect to the conditional posterior density for the nuisance
parameters, that is, with respect to

$$p(\xi, \theta_j \mid x_j, \eta) \propto \prod_p p(x_{i_p j} \mid \theta_j, \xi_{i_p})\, h(\xi_{i_p} \mid \mu_p, \Sigma_p)\, \phi(\theta_j). \tag{10}$$

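It follows that the likelihood equations are given by

$$\mu_p = \frac{1}{N} \sum_j E(\xi_{i_p} \mid x_j, \eta), \tag{11}$$

$$\sigma^2_{pu} = \frac{1}{N} \sum_j E(\xi^2_{pu} \mid x_j, \eta) - \mu^2_{pu}, \tag{12}$$

and

$$\sigma_{puv} = \frac{1}{N} \sum_j E(\xi_{pu} \xi_{pv} \mid x_j, \eta) - \mu_{pu} \mu_{pv}, \tag{13}$$

where $N$ is the number of examinees and indices $u$ and $v \neq u$ denote the $u$th and $v$th element in the parameter vectors. These
equations can be solved using an EM or Newton-Raphson algorithm.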
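
In an EM implementation, the likelihood equations take the form of an M-step that averages posterior expectations over examinees. The sketch below shows only this update for one parent, assuming the averages are taken over the N examinees and that the E-step (not shown) has already produced the posterior first and second moments; the function names and the toy input are hypothetical.

```python
def m_step(post_means, post_second_moments):
    """Update (mu_p, Sigma_p) from posterior moments of xi for one parent.

    post_means[j][u]             = E(xi_u | x_j, eta)
    post_second_moments[j][u][v] = E(xi_u * xi_v | x_j, eta)
    """
    n = len(post_means)
    dim = len(post_means[0])
    # Mean update: average of the posterior means over examinees.
    mu = [sum(m[u] for m in post_means) / n for u in range(dim)]
    # (Co)variance update: average second moment minus product of the means.
    sigma = [[sum(s[u][v] for s in post_second_moments) / n - mu[u] * mu[v]
              for v in range(dim)] for u in range(dim)]
    return mu, sigma

def outer(v):
    """Outer product v v^T as a nested list."""
    return [[a * b for b in v] for a in v]

# Toy input: posteriors concentrated on single points, so that the second
# moments reduce to outer products of those points.
points = [[0.0, 1.0, -1.0], [2.0, 3.0, -3.0]]
mu_p, sigma_p = m_step(points, [outer(p) for p in points])
print(mu_p)
print(sigma_p)
```

In a full algorithm this update would alternate with an E-step that recomputes the posterior moments under the current value of the hyperparameters until convergence.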

Computation of the standard errors of the parameter estimates is a straightforward
generalization of the method for the 3PL model presented in Glas (2000). These estimates
are found upon inverting the approximate information matrix

$$H(\eta, \eta) \approx \sum_j E\left( \frac{\partial}{\partial \eta} \log f_j(\xi, \theta_j \mid \mu_p, \Sigma_p) \,\Big|\, x_j, \eta \right) E\left( \frac{\partial}{\partial \eta} \log f_j(\xi, \theta_j \mid \mu_p, \Sigma_p) \,\Big|\, x_j, \eta \right)^{T}.$$

Bayes Modal Calibration 

The use of Bayes modal estimation can be motivated by the fact that the parameters 
in the 3PL model are sometimes hard to estimate because they are poorly determined by 
the available data. In such instances, the behavior of the item response functions over the 
region of the ability scale where the respondents are located can be described by different 
combinations of parameter values. As a result, the estimates of the parameters in the 3PL
model are highly correlated. Adding a covariance matrix for every parent may worsen 
the identifiability of the model for such data sets. 

To obtain “reasonable”, finite estimates, Mislevy (1986) considered a number of
Bayesian approaches. Each of them entails the introduction of prior distributions on
the item parameters. Parameter estimates are computed by maximizing the log-posterior
density of $\eta$, which is proportional to $\log L(\eta; x) + \log p(\eta \mid \zeta) + \log p(\zeta)$, where
$p(\eta \mid \zeta)$ is the prior density of $\eta$, characterized by parameters $\zeta$, which in turn
follow a density $p(\zeta)$. In one approach, the prior distribution $p(\eta \mid \zeta)$ is
fixed by the item calibrator; in another, often labeled empirical Bayes, the parameters of the
prior distribution are estimated along with the other parameters, for example, as the modes
of their posterior distribution. In our case, the second approach is formally identical to
Bayes modal or maximum a posteriori (MAP) estimation of the parent parameters, albeit
that the estimates have to be found for all parents simultaneously. The approach involves
a change of the likelihood equations to $\partial \log L(\eta; x)/\partial \eta + \partial \log p(\eta \mid \zeta)/\partial \eta = 0$, while
simultaneously the equations $\partial \log p(\eta \mid \zeta)/\partial \zeta + \partial \log p(\zeta)/\partial \zeta = 0$ must be solved. An
outline of the procedure for the current item cloning model is given in Appendix A.

Discussion 

The assumption that all respondents are drawn from one population can be replaced 
by the assumption of multiple populations of respondents each with a normal ability 
distribution indexed by a unique mean and variance parameter. Bock and Zimowski 
(1997) point out that this generalization, together with the possibility of analyzing 
incomplete item-calibration designs, provides a unified approach to such problems 
as differential item functioning, item parameter drift, non-equivalent groups equating, 
vertical equating and matrix-sampled educational assessment. Though not illustrated 
here, calibration under the item-cloning model can also be extended to fit this framework. 
Adaptive Selection of Parent Items



Our initial estimate of the ability of examinee $j$ is the prior distribution in (4), which
has a density denoted as $\phi(\theta)$. Suppose parents $1, \ldots, k - 1$ have been selected. For each
parent a clone has been administered, the responses to which are denoted by a vector
$x_j^{(k-1)} = (x_{j1}, \ldots, x_{j(k-1)})$. Then the posterior distribution of $\theta$ given $x_j^{(k-1)}$ is

$$p(\theta \mid x_j^{(k-1)}) \propto \phi(\theta) \prod_{p=1}^{k-1} \int p(x_{jp} \mid \theta, \xi_p)\, p(\xi_p \mid \mu_p, \Sigma_p)\, d\xi_p. \tag{14}$$

The variance of this posterior distribution is denoted as $\mathrm{Var}(\theta \mid x_j^{(k-1)})$.
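
A minimal numerical sketch of this posterior is given below. It evaluates the posterior on an equally spaced ability grid and averages the response probabilities over Monte Carlo draws of the clone parameters; the grid, the number of draws, the parametrization by a lower-triangular covariance factor, and the parent values are all assumptions made for illustration.

```python
import math
import random

def p_3pl(theta, xi):
    """3PL success probability for xi = (log a, b, logit c)."""
    a = math.exp(xi[0])
    c = 1.0 / (1.0 + math.exp(-xi[2]))
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - xi[1])))

def draw_xi(mu, chol, rng):
    """One draw from N(mu, chol chol^T); chol is a lower-triangular factor."""
    z = [rng.gauss(0.0, 1.0) for _ in mu]
    return [mu[i] + sum(chol[i][k] * z[k] for k in range(i + 1)) for i in range(len(mu))]

def averaged_curve(mu, chol, grid, rng, n_draws=300):
    """P(correct | theta) with the clone parameters averaged out, on the grid."""
    draws = [draw_xi(mu, chol, rng) for _ in range(n_draws)]
    return [sum(p_3pl(t, xi) for xi in draws) / n_draws for t in grid]

def posterior(grid, curves, responses):
    """Normalized posterior weights on the grid after the given responses."""
    weights = [math.exp(-0.5 * t * t) for t in grid]  # N(0, 1) prior up to a constant
    for curve, x in zip(curves, responses):
        weights = [w * (p if x == 1 else 1.0 - p) for w, p in zip(weights, curve)]
    total = sum(weights)
    return [w / total for w in weights]

def mean_var(grid, post):
    """Posterior mean (EAP estimate) and posterior variance on the grid."""
    mean = sum(t * w for t, w in zip(grid, post))
    var = sum((t - mean) ** 2 * w for t, w in zip(grid, post))
    return mean, var

grid = [-4.0 + 0.1 * g for g in range(81)]
chol = [[0.2, 0.0, 0.0], [0.0, 0.3, 0.0], [0.0, 0.0, 0.2]]  # hypothetical factor
parents = [([0.0, -0.5, -1.386], chol), ([0.2, 0.8, -1.386], chol)]  # hypothetical parents

rng = random.Random(3)
curves = [averaged_curve(mu, c, grid, rng) for mu, c in parents]
prior_mean, prior_var = mean_var(grid, posterior(grid, [], []))
post = posterior(grid, curves, [1, 1])  # two correct responses
post_mean, post_var = mean_var(grid, post)
print(round(post_mean, 3), round(post_var, 3))
```

The posterior mean computed this way corresponds to an EAP estimate of ability under the item cloning model.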

The $k$th parent should be selected to be optimal at this posterior distribution. Several
Bayesian criteria of optimality have been suggested; for studies of several old and new
criteria, see van der Linden (1998) and van der Linden and Pashley (2000). The one used
in the computer simulations below is the criterion of minimum expected posterior variance
adapted for use with the item-cloning model. It selects the $k$th parent to have minimum
posterior variance averaged both over the set of clones associated with the parent and the
responses to the clones predicted from the examinee’s current ability estimate.

If parent $p$ in the pool were selected as the $k$th parent in the test, the posterior
predictive distribution of the response of examinee $j$ to a random item from this parent
given the previous responses $x_j^{(k-1)}$ is given by

$$f(x_{jp_k} \mid x_j^{(k-1)}) = \int \left[ \iiint p(x_{jp_k} \mid \theta, \xi_{p_k})\, p(\xi_{p_k} \mid \mu_{p_k}, \Sigma_{p_k})\, d\xi_{p_k} \right] p(\theta \mid x_j^{(k-1)})\, d\theta. \tag{15}$$



Note that the probability of the response is first averaged over the distribution of the 
item parameters for parent $p_k$ and then over the posterior distribution of the ability of the
examinee. 

The two possible responses lead to updates of the posterior variance which we denote
as $\mathrm{Var}(\theta \mid x_j^{(k-1)}, X_{jp_k} = 0)$ and $\mathrm{Var}(\theta \mid x_j^{(k-1)}, X_{jp_k} = 1)$. The proposed criterion for the
selection of the $k$th parent is the expected value of this update. That is,

$$p_k = \arg\min_r \left\{ \mathrm{Var}(\theta \mid x_j^{(k-1)}, X_{jr_k} = 0)\, f(0 \mid x_j^{(k-1)}) + \mathrm{Var}(\theta \mid x_j^{(k-1)}, X_{jr_k} = 1)\, f(1 \mid x_j^{(k-1)}) : r \in R_k \right\}, \tag{16}$$

where $R_k$ is the set of parents in the pool from which the $k$th parent is chosen.
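
The two-stage selection can be sketched as follows. The first stage scores each candidate parent by its expected posterior variance, with clone parameters averaged out by Monte Carlo on an ability grid; the second stage draws a random clone from the selected parent. The grid, the number of draws, the candidate parents named "near" and "far", and their values are hypothetical and chosen only to make the behavior of the criterion visible.

```python
import math
import random

def p_3pl(theta, xi):
    """3PL success probability for xi = (log a, b, logit c)."""
    a = math.exp(xi[0])
    c = 1.0 / (1.0 + math.exp(-xi[2]))
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - xi[1])))

def draw_xi(mu, chol, rng):
    """One draw from N(mu, chol chol^T); chol is a lower-triangular factor."""
    z = [rng.gauss(0.0, 1.0) for _ in mu]
    return [mu[i] + sum(chol[i][k] * z[k] for k in range(i + 1)) for i in range(len(mu))]

def averaged_curve(mu, chol, grid, rng, n_draws=300):
    """P(correct | theta) for a random clone of the parent, on the theta grid."""
    draws = [draw_xi(mu, chol, rng) for _ in range(n_draws)]
    return [sum(p_3pl(t, xi) for xi in draws) / n_draws for t in grid]

def variance(grid, weights):
    """Variance of a normalized discrete distribution on the grid."""
    mean = sum(t * w for t, w in zip(grid, weights))
    return sum((t - mean) ** 2 * w for t, w in zip(grid, weights))

def expected_posterior_variance(post, curve, grid):
    """Expected posterior variance for one candidate parent."""
    total = 0.0
    for x in (0, 1):
        joint = [w * (p if x == 1 else 1.0 - p) for w, p in zip(post, curve)]
        pred = sum(joint)                    # posterior predictive probability of x
        updated = [v / pred for v in joint]  # posterior after observing x
        total += pred * variance(grid, updated)
    return total

def select_parent(post, candidates, grid, rng):
    """First stage: parent with minimum expected posterior variance."""
    scores = {name: expected_posterior_variance(post, averaged_curve(mu, chol, grid, rng), grid)
              for name, (mu, chol) in candidates.items()}
    return min(scores, key=scores.get), scores

grid = [-4.0 + 0.1 * g for g in range(81)]
prior = [math.exp(-0.5 * t * t) for t in grid]
total = sum(prior)
post = [w / total for w in prior]  # no responses yet, so the posterior is the prior

chol = [[0.2, 0.0, 0.0], [0.0, 0.3, 0.0], [0.0, 0.0, 0.2]]
candidates = {"near": ([0.3, 0.0, -1.386], chol), "far": ([0.3, 4.0, -1.386], chol)}

rng = random.Random(11)
best, scores = select_parent(post, candidates, grid, rng)  # first stage
clone_xi = draw_xi(*candidates[best], rng)                 # second stage: random clone
print(best, {k: round(v, 3) for k, v in scores.items()})
```

In this toy setting the parent whose mean difficulty lies near the center of the current posterior yields the lower expected posterior variance, whereas the parent with a mean difficulty far above it leaves the posterior almost unchanged.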

Simulation Studies 

Two simulation studies were conducted. One study addressed the accuracy of
the MML calibration procedure for the item cloning model in (11)-(13) under a variety
of conditions. The other addressed the accuracy of the ability estimator from the item
selection procedure based on the criterion in (16) under the same conditions.

Three different types of CAT were studied, namely CAT from a pool of: 

(1) cloned items calibrated and administered under the item cloning model; 

(2) individual items calibrated and administered under the regular 3PL model; 

(3) cloned items calibrated and administered under the regular 3PL model. 

The comparison between Type 1 and Type 2 helps us to identify the potential loss in
accuracy due to second-stage item sampling and the presence of random item parameters 
in the item cloning model. The comparison between Type 1 and Type 3 shows us 
the statistical consequences of adaptive testing from a pool of cloned items under a 
conventional model that ignores the dependences between responses to items cloned from 
the same parent. These dependences are created by the fact that such items share certain 
structural features and attributes. The regular 3PL model in Type 3 CAT does not allow 
for such dependences, whereas the multilevel IRT model in (1)-(3) for Type 1 CAT does.

Items 

Because the composition of the item pool can have a substantial impact on item 
calibration and ability estimation results in CAT, the items used in each of the three 
types of CAT were generated using a common multivariate normal distribution for the 
(transformed) item parameters $(\log a, b, \operatorname{logit} c)$, with mean

$$\mu_0 = (.0, .0, \operatorname{logit}(.2)) \tag{17}$$

and covariance matrix

$$\Sigma_0 = \begin{pmatrix} 0.20 & 0.05 & -0.05 \\ 0.05 & 1.00 & 0.10 \\ -0.05 & 0.10 & 0.10 \end{pmatrix}. \tag{18}$$



Item pools with a cloning structure were obtained by sampling the values for the
vector of means of the distribution of the item parameters for each parent in (3), $\mu_p$,
from (17)-(18). The covariance matrices of these distributions were all equated to the
matrix in (18); that is, $\Sigma_p$ was set equal to $\Sigma_0$ for all $p$. Pools with individual items were
obtained by sampling their true item parameter values from the distribution in (17)-(18). To
approximate the composition of the previous type of pool as closely as possible, the pools
were refreshed for each replication.
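
The generation of a pool with a cloning structure can be sketched as follows. The mean vector and covariance matrix are those read from (17)-(18), with the first two elements of the mean taken to be zero; the numbers of parents and of clones per parent, the seed, and the helper names are hypothetical, since the text does not state how the pool was divided over parents.

```python
import math
import random

MU_0 = [0.0, 0.0, math.log(0.2 / 0.8)]   # mean in (17), as read from the text
SIGMA_0 = [[0.20, 0.05, -0.05],
           [0.05, 1.00, 0.10],
           [-0.05, 0.10, 0.10]]           # covariance matrix in (18)

def cholesky(m):
    """Lower-triangular Cholesky factor of a small positive definite matrix."""
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(m[i][i] - s)
            else:
                low[i][j] = (m[i][j] - s) / low[j][j]
    return low

def sample_mvn(mu, low, rng):
    """One draw from N(mu, low low^T)."""
    z = [rng.gauss(0.0, 1.0) for _ in mu]
    return [mu[i] + sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(len(mu))]

def to_abc(xi):
    """Map (log a, b, logit c) back to the 3PL parameters (a, b, c)."""
    return math.exp(xi[0]), xi[1], 1.0 / (1.0 + math.exp(-xi[2]))

def generate_cloned_pool(n_parents, n_clones, rng):
    """Sample parent means from (17)-(18), then clones around each parent mean."""
    low = cholesky(SIGMA_0)
    pool = []
    for _ in range(n_parents):
        mu_p = sample_mvn(MU_0, low, rng)  # parent mean
        # Within-parent covariance set equal to the matrix in (18).
        clones = [to_abc(sample_mvn(mu_p, low, rng)) for _ in range(n_clones)]
        pool.append({"mu": mu_p, "clones": clones})
    return pool

rng = random.Random(2024)
pool = generate_cloned_pool(n_parents=40, n_clones=10, rng=rng)  # hypothetical split
print(len(pool), len(pool[0]["clones"]), pool[0]["clones"][0])
```

A pool of individual items for the comparison conditions would be obtained in the same way by drawing the item parameters directly from the distribution with mean `MU_0` and covariance `SIGMA_0`.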



Calibration 

In this simulation study, the following additional variables were manipulated: 

(1) test length: n = 20, 30 and 40 items;

(2) sample size: N = 100, 400 and 1,000 examinees.

For each condition, N examinees were simulated, drawing random values for $\theta$
from the standard normal distribution. The mean absolute errors in the estimates of the
parameters in the item cloning model (Type 1 CAT) or the 3PL model used to calibrate the
item pools (Type 2 CAT) are shown in Table 1.



Insert Table 1 about here 



The patterns in the errors for the two models are approximately the same. As expected,
the errors decreased with both the size of the sample and the length of the test, and generally
larger errors were obtained for the discrimination than for the difficulty parameters. The
last three columns show the differences in mean absolute error between the parameter
estimates for the two models. The differences between the errors in the estimates of the
guessing parameter are negligible. The differences between the errors in the estimates of
the difficulty and discrimination parameters are small but, as expected, systematically in
favor of those for the regular CAT model. (Observe that these two sets of parameters are
on identical scales but have distributions of true values that show random differences. For
the given item pool size, the effect on the comparison in Table 1 can be assumed to be
negligible, though.)

In Table 2, the same comparison is made for the parameter estimates for a pool of
cloned items calibrated under the item cloning model in this paper (Type 1 CAT) and a
regular 3PL model that ignores the item cloning structure (Type 3 CAT). The differences
are generally larger than in the previous comparison.



Insert Table 2 about here 



The covariance matrix in (18) could be estimated only for calibration under the item 
cloning model. The mean absolute estimation errors are given in Table 3. Observe that the
errors in the estimates of the variances decrease with both the sample size and the test length
but that the decrease is negligible for the estimates of the covariances. Generally, and not
unexpectedly, estimation of the covariance matrix appeared to be much less accurate than
estimation of the vector of means of the parameters in the item cloning model.



Insert Table 3 about here 



Ability Estimation 

The same three types of CAT as in the calibration study were studied. The size of the 
pool was always equal to 400. The final ability estimates in Type 1 CAT were calculated 
as the expected value of the posterior distribution (EAP estimate) in (14). In the other 
two types of CAT, EAP estimates under the regular 3PL model with the prior in (4) were
calculated. 

The following additional variables were manipulated: 

(1) test length: n = 20, 30 and 40 items;

(2) true ability value: $\theta$ = -2.0, -1.0, 0.0, 1.0, and 2.0, and $\theta \sim N(0, 1)$.

For each condition, 400 examinees were simulated. The item parameters were
redrawn for every simulee. The mean absolute errors in the ability estimates are shown in
Table 4. The comparison between the errors for Type 1 and Type 2 CAT shows the price in
efficiency to be paid for item cloning with second-stage random sampling of clones from
parent items. The differences were negligible for $\theta$ values close to zero but increased
toward the tails of the $\theta$ distribution. This change is due to the use of the standard normal
prior in (4), which favors item selection near $\theta = 0$ at the beginning of the test for both
types of CAT. The comparison between Type 3 and Type 1 CAT shows the additional loss
of accuracy if the dependencies between responses to items cloned from the same parent
are not modeled. These differences were negligible for $\theta$ values close to zero but again
increased toward the tails of the $\theta$ distribution. The average error across sampling of
examinees from a standard normal population showed the same pattern but with smaller
values. Also, both series of differences showed a tendency to decrease with the length of
the test, although the tendency was smaller for the types of CAT with item cloning than for
regular CAT.



Insert Table 4 about here 



Conclusion 

The advantage of CAT with item cloning is a potentially large reduction in the
resources needed for item pool development. The price to be paid for this advantage
is a reduction in the accuracy of the ability estimates. For the typical test length in the
current adaptive testing programs of n = 30, the decrease in the average accuracy of
ability estimation across a normal population of examinees was slightly over 10% for the
multilevel model in this paper. The decrease can easily be compensated by adding 2-4
items to the test. It is left to the testing agent to decide if the trade-off between the reduction in
item pool development costs and the increase in test length is advantageous.
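Appendix A: Bayes Modal Estimates for the Item Cloning Model

The marginal probability of observing response pattern $x_j$ is enhanced with a
conjugate prior $p(\mu_p, \Sigma_p \mid \mu_0, \Sigma_0)$. The conjugate prior distribution for $\mu_p$ and $\Sigma_p$ is
a product of a normal and an inverse-Wishart distribution (see, for instance, Box & Tiao,
1973). The marginal probability of examinee $j$'s response vector now becomes

$$\int \cdots \int \prod_p p(x_{jp} \mid \theta, \xi_p)\, p(\xi_p \mid \mu_p, \Sigma_p)\, p(\mu_p, \Sigma_p \mid \mu_0, \Sigma_0)\, \phi(\theta)\, d\xi\, d\theta. \tag{A.1}$$

Consider the complete data specification

$$p(x, \xi, \theta \mid \mu, \Sigma) = \prod_j \prod_p p(x_{jp} \mid \theta_j, \xi_{jp})\, p(\xi_{jp} \mid \mu_p, \Sigma_p)\, p(\mu_p, \Sigma_p \mid \mu_0, \Sigma_0). \tag{A.2}$$

The factors

$$\prod_j \prod_p p(\xi_{jp} \mid \mu_p, \Sigma_p)\, p(\mu_p, \Sigma_p \mid \mu_0, \Sigma_0)$$

entail a normal model with a normal-inverse-Wishart prior, with parameters $\mu_0$ and $\Sigma_0$,
$\nu_0$ the degrees of freedom for the prior of $\Sigma_p$ and $\kappa_0$ the degrees of freedom for $\mu_0$. Then
the posterior is also normal-inverse-Wishart distributed with parameters

$$
\begin{aligned}
\mu_n &= \frac{\kappa_0}{\kappa_0 + n} \mu_0 + \frac{n}{\kappa_0 + n} \bar{\xi}_p, \\
\nu_n &= \nu_0 + n, \\
\kappa_n &= \kappa_0 + n, \\
\Sigma_n &= S_p + \frac{\kappa_0 n}{\kappa_0 + n} (\bar{\xi}_p - \mu_0)(\bar{\xi}_p - \mu_0)^T + \Sigma_0,
\end{aligned}
$$

where $S_p = \sum_{j=1}^{n} (\xi_{jp} - \bar{\xi}_p)(\xi_{jp} - \bar{\xi}_p)^T$.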
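
The conjugate update can be sketched numerically as follows. The formulas are the standard textbook update for a normal model with a normal-inverse-Wishart prior, applied here to item parameter vectors that are treated as if they were observed; in the calibration procedure itself they are latent and enter through posterior expectations. The function name `niw_update`, the prior values, and the toy data are hypothetical.

```python
def niw_update(data, mu0, kappa0, nu0, lambda0):
    """Posterior parameters of a normal-inverse-Wishart prior after observing data."""
    n = len(data)
    dim = len(mu0)
    mean = [sum(row[u] for row in data) / n for u in range(dim)]
    # Sum of squares and cross-products around the sample mean.
    scatter = [[sum((row[u] - mean[u]) * (row[v] - mean[v]) for row in data)
                for v in range(dim)] for u in range(dim)]
    kappa_n = kappa0 + n
    nu_n = nu0 + n
    # Posterior mean: precision-weighted average of prior mean and sample mean.
    mu_n = [(kappa0 * mu0[u] + n * mean[u]) / kappa_n for u in range(dim)]
    shrink = kappa0 * n / kappa_n
    lambda_n = [[lambda0[u][v] + scatter[u][v]
                 + shrink * (mean[u] - mu0[u]) * (mean[v] - mu0[v])
                 for v in range(dim)] for u in range(dim)]
    return mu_n, kappa_n, nu_n, lambda_n

# Hypothetical prior and two toy parameter vectors for one parent.
mu0 = [0.0, 0.0, 0.0]
lambda0 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
data = [[1.0, 0.0, 0.0], [3.0, 0.0, 0.0]]
mu_n, kappa_n, nu_n, lambda_n = niw_update(data, mu0, kappa0=1.0, nu0=5.0, lambda0=lambda0)
print(mu_n, kappa_n, nu_n, lambda_n[0][0])
```

The update shrinks the sample mean of the parameter vectors toward the prior mean, with the amount of shrinkage governed by the ratio of the prior weight to the number of observations.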

As can be verified in (9), the likelihood equations are the posterior expectations of
the first-order derivatives of the complete data log-likelihood. Analogous to (11)-(13), we
now have

$$\mu_p = \frac{1}{\kappa_0 + n} \sum_j E(\xi_p \mid x_j, \eta) + \frac{\kappa_0}{\kappa_0 + n} \mu_0, \tag{A.3}$$

and

$$\Sigma_p = \sum_j E\left[ (\xi_p - \bar{\xi}_p)(\xi_p - \bar{\xi}_p)^T \mid x_j, \eta \right] + \frac{\kappa_0 n}{\kappa_0 + n} (\bar{\xi}_p - \mu_0)(\bar{\xi}_p - \mu_0)^T + \Sigma_0, \tag{A.4}$$

with

$$\bar{\xi}_p = \frac{1}{n} \sum_j E(\xi_p \mid x_j, \eta).$$
Table 1

Mean absolute error in item parameter estimates

                 Type 1               Type 2                  1-2
n       N      α      β      γ      α      β      γ      α      β      γ
20    100   .312   .395   .061   .304   .385   .060   .008   .010   .001
20    400   .238   .292   .056   .219   .278   .057   .019   .014  -.001
20   1000   .201   .241   .051   .172   .222   .051   .029   .019   .000
30    100   .306   .384   .059   .295   .369   .059   .011   .015   .000
30    400   .228   .281   .054   .213   .261   .054   .015   .020   .000
30   1000   .189   .251   .052   .165   .230   .052   .024   .021   .000
40    100   .299   .384   .060   .287   .374   .060   .012   .010   .000
40    400   .222   .286   .055   .202   .264   .055   .020   .022   .000
40   1000   .189   .229   .051   .161   .219   .050   .028   .010   .001
Table 2

Mean absolute error in item parameter estimates

                 Type 3               Type 1                  3-1
n       N      α      β      γ      α      β      γ      α      β      γ
20    100   .353   .409   .061   .312   .395   .061   .041   .014   .000
20    400   .277   .310   .054   .238   .292   .056   .039   .018  -.002
20   1000   .244   .275   .051   .201   .241   .051   .043   .034   .000
30    100   .321   .407   .058   .306   .384   .059   .015   .023  -.001
30    400   .259   .303   .054   .228   .281   .054   .029   .022   .000
30   1000   .241   .277   .052   .189   .251   .052   .052   .026   .000
40    100   .321   .400   .057   .299   .384   .060   .022   .016  -.003
40    400   .257   .303   .054   .222   .286   .055   .035   .017  -.001
40   1000   .238   .277   .052   .189   .229   .051   .049   .048   .001
Table 3

Mean absolute error for estimates of item covariance matrix

n       N   σ_log α     σ_β   σ_logit γ   σ_log α,β   σ_log α,logit γ   σ_β,logit γ
20    100      .040    .278       .0070        .149              .024          .122
20    400      .028    .223       .0066        .136              .017          .125
20   1000      .026    .218       .0053        .132              .017          .122
30    100      .042    .241       .0070        .141              .024          .123
30    400      .027    .223       .0068        .132              .017          .122
30   1000      .027    .215       .0050        .137              .017          .126
40    100      .034    .275       .0058        .134              .020          .110
40    400      .026    .206       .0055        .116              .016          .105
40   1000      .025    .207       .0050        .116              .016          .107
Table 4

Mean absolute error in ability estimates

                                  θ
n   Type of CAT   -2.0   -1.0    0.0    1.0    2.0   Standard Normal
20            1   .560   .285   .263   .268   .421              .291
20            2   .438   .256   .257   .202   .348              .282
20            3   .557   .292   .264   .248   .468              .290
30            1   .476   .285   .261   .225   .365              .260
30            2   .364   .240   .257   .153   .275              .230
30            3   .489   .279   .256   .207   .403              .258
40            1   .436   .265   .255   .175   .307              .223
40            2   .332   .219   .255   .132   .248              .204
40            3   .453   .247   .264   .152   .341              .234